Heritability is a statistic used in the fields of animal husbandry and genetics that estimates the degree of variation in a phenotypic trait in a population that is due to genetic variation between individuals in that population. The concept of heritability can be expressed in the form of the following question: "What is the proportion of the variation in a given trait within a population that is not explained by the environment or random chance?"
Other causes of measured variation in a trait are characterized as environmental factors, including observational error. In human studies of heritability these are often apportioned into factors from "shared environment" and "non-shared environment" based on whether they tend to result in persons brought up in the same household being more or less similar to persons who were not.
Heritability is estimated by comparing individual phenotypic variation among related individuals in a population, by examining the association between individual phenotype and genotype data, or even by modeling summary-level data from genome-wide association studies (GWAS). Heritability is an important concept in quantitative genetics, particularly in selective breeding and behavior genetics (for instance, twin studies). It is the source of much confusion because its technical definition is different from its commonly understood folk definition. Therefore, its use conveys the incorrect impression that behavioral traits are "inherited" or specifically passed down through the genes. Behavioral geneticists also conduct heritability analyses based on the assumption that genes and environments contribute in a separate, additive manner to behavioral traits.
The extent of dependence of phenotype on environment can also be a function of the genes involved. Matters of heritability are complicated because genes may canalize a phenotype, making its expression almost inevitable in all occurring environments. Individuals with the same genotype can also exhibit different phenotypes through a mechanism called phenotypic plasticity, which makes heritability difficult to measure in some cases. Recent insights in molecular biology have identified changes in transcriptional activity of individual genes associated with environmental changes. However, there are a large number of genes whose transcription is not affected by the environment.
Estimates of heritability use statistical analyses to help identify the causes of differences between individuals. Since heritability is concerned with variance, it is necessarily an account of the differences between individuals in a population. Heritability can be univariate (examining a single trait) or multivariate (examining the genetic and environmental associations between multiple traits at once). This allows a test of the genetic overlap between different phenotypes, for instance, hair color and eye color. Environment and genetics may also interact, and heritability analyses can test for and examine these interactions (GxE models).
A prerequisite for heritability analyses is that there is some population variation to account for. It follows that heritability cannot take into account the effect of factors that are invariant in the population. Factors may be invariant because they are absent from the population, such as when no one has access to a particular antibiotic, or because they are omnipresent, such as when everyone drinks coffee. In practice, all human behavioral traits vary and almost all traits show some heritability.
$H^2 = \mathrm{Var}(G)/\mathrm{Var}(P)$ is the broad-sense heritability. It reflects all the genetic contributions to a population's phenotypic variance, including additive, dominance, and epistatic (multi-genic interaction) effects, as well as maternal effects, where individuals are directly affected by their mother's phenotype, such as through milk production in mammals.
A particularly important component of the genetic variance is the additive variance, Var(A), which is the variance due to the average effects (additive effects) of the alleles. Since each parent passes a single allele per locus to each offspring, parent-offspring resemblance depends upon the average effect of single alleles. Additive variance therefore represents the genetic component of variance responsible for parent-offspring resemblance. The additive genetic portion of the phenotypic variance is known as narrow-sense heritability and is defined as
$$h^2 = \frac{\mathrm{Var}(A)}{\mathrm{Var}(P)}.$$
An upper-case $H^2$ is used to denote broad sense, and a lower-case $h^2$ for narrow sense.
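As a small numeric illustration of the two ratios, $H^2 = \mathrm{Var}(G)/\mathrm{Var}(P)$ and $h^2 = \mathrm{Var}(A)/\mathrm{Var}(P)$, the Python sketch below uses made-up variance components. It assumes that the phenotypic variance is simply the sum of the genetic and environmental components (no gene-environment covariance or interaction); the function names are illustrative only.

```python
def broad_sense_heritability(var_g: float, var_p: float) -> float:
    """H^2 = Var(G) / Var(P): share of phenotypic variance due to all genetic effects."""
    return var_g / var_p


def narrow_sense_heritability(var_a: float, var_p: float) -> float:
    """h^2 = Var(A) / Var(P): share of phenotypic variance due to additive effects."""
    return var_a / var_p


# Hypothetical variance components (illustrative numbers, not real data).
var_a, var_d, var_i, var_e = 40.0, 10.0, 5.0, 45.0
var_g = var_a + var_d + var_i  # additive + dominance + epistatic
var_p = var_g + var_e          # assumes no G-E covariance or interaction

print(broad_sense_heritability(var_g, var_p))   # 0.55
print(narrow_sense_heritability(var_a, var_p))  # 0.4
```

Because $\mathrm{Var}(A) \le \mathrm{Var}(G)$, the narrow-sense value can never exceed the broad-sense value for the same population.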
For traits that are not continuous but dichotomous, such as an additional toe or certain diseases, the contributions of the various alleles can be treated as a sum (the liability) which, once it passes a threshold, manifests itself as the trait. This gives the liability threshold model, in which heritability can be estimated and selection modeled.
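A minimal sketch of the liability threshold idea follows. It assumes that liability is standardized to a normal distribution with mean 0 and variance 1, so the threshold can be read off from the population prevalence; the helper names and the allele contribution values are hypothetical.

```python
from statistics import NormalDist


def threshold_for_prevalence(prevalence: float) -> float:
    """Liability threshold T such that P(liability > T) = prevalence,
    assuming liability is standard normal (mean 0, variance 1)."""
    return NormalDist().inv_cdf(1.0 - prevalence)


def shows_trait(allele_contributions, environment: float, threshold: float) -> bool:
    """The dichotomous trait appears when the summed allelic contributions
    plus the environmental deviation exceed the threshold."""
    liability = sum(allele_contributions) + environment
    return liability > threshold


t = threshold_for_prevalence(0.01)  # a trait affecting 1% of the population
print(round(t, 2))                                # 2.33
print(shows_trait([0.5, 0.5, 0.5, 0.5], 0.5, t))  # True  (liability 2.5)
print(shows_trait([0.5, 0.5, 0.5, 0.5], 0.0, t))  # False (liability 2.0)
```

With a prevalence of 1%, the threshold sits about 2.33 standard deviations above the mean liability, so only individuals whose combined genetic and environmental liability lies in the upper tail show the trait.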
Additive variance is important for selection. If a selective pressure such as improving livestock is exerted, the response of the trait is directly related to narrow-sense heritability. The mean of the trait will increase in the next generation as a function of how much the mean of the selected parents differs from the mean of the population from which the selected parents were chosen. The observed response to selection leads to an estimate of the narrow-sense heritability (called realized heritability). This is the principle underlying artificial selection or breeding.
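The number of B alleles can be 0, 1, or 2. For any genotype ($B_i$, $B_j$), where $B_i$ and $B_j$ are either 0 or 1, the expected phenotype can then be written as the sum of the overall mean, a linear effect, and a dominance deviation (one can think of the dominance term as an interaction between $B_i$ and $B_j$):
$$\mathrm{E}[P \mid B_i, B_j] = \mu + \alpha_k + d_k, \qquad k = B_i + B_j,$$
where the additive effect $\alpha_k$ is a linear function of $k$ and $d_k$ is the dominance deviation.
The additive genetic variance at this locus is the weighted mean of the squares of the additive effects:
$$\mathrm{Var}(A) = f_0\alpha_0^2 + f_1\alpha_1^2 + f_2\alpha_2^2,$$
where $f_k$ is the frequency of genotypes carrying $k$ copies of B, $f_0 + f_1 + f_2 = 1$, and $f_0\alpha_0 + f_1\alpha_1 + f_2\alpha_2 = 0$.
There is a similar relationship for the variance of dominance deviations:
$$\mathrm{Var}(D) = f_0 d_0^2 + f_1 d_1^2 + f_2 d_2^2,$$
where $f_0 d_0 + f_1 d_1 + f_2 d_2 = 0$.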
The linear regression of phenotype on genotype is shown in Figure 1.
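The single-locus decomposition can be reproduced numerically as a frequency-weighted linear regression of genotypic value on $k$, the number of B alleles: the fitted values (centered on the mean) give the additive effects $\alpha_k$, and the residuals give the dominance deviations $d_k$. The Python sketch below assumes one biallelic locus with known genotypic values and genotype frequencies $f_k$, both ordered by $k = 0, 1, 2$; the function name is hypothetical.

```python
def single_locus_components(genotype_values, genotype_freqs):
    """Split the genotypic variance at one biallelic locus into Var(A) and Var(D).

    genotype_values[k] and genotype_freqs[k] refer to genotypes carrying
    k = 0, 1, 2 copies of allele B. Additive effects come from the
    frequency-weighted least-squares regression of genotypic value on k;
    dominance deviations are the residuals of that regression.
    """
    ks = (0, 1, 2)
    f = genotype_freqs
    if abs(sum(f) - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    mu = sum(fk * gk for fk, gk in zip(f, genotype_values))  # population mean
    k_bar = sum(fk * k for fk, k in zip(f, ks))              # mean number of B alleles
    sxx = sum(fk * (k - k_bar) ** 2 for fk, k in zip(f, ks))
    sxy = sum(fk * (k - k_bar) * (gk - mu) for fk, k, gk in zip(f, ks, genotype_values))
    slope = sxy / sxx                                 # average effect of one B allele
    alpha = [slope * (k - k_bar) for k in ks]         # additive effects, weighted mean 0
    dom = [gk - mu - ak for gk, ak in zip(genotype_values, alpha)]  # dominance deviations
    var_a = sum(fk * ak ** 2 for fk, ak in zip(f, alpha))
    var_d = sum(fk * dk ** 2 for fk, dk in zip(f, dom))
    return var_a, var_d


# Completely dominant locus, B at frequency 0.5 in Hardy-Weinberg proportions.
print(single_locus_components([-1.0, 1.0, 1.0], [0.25, 0.5, 0.25]))  # (0.5, 0.25)
```

For this completely dominant example the sketch returns Var(A) = 0.5 and Var(D) = 0.25, matching the textbook expressions $2pq\alpha^2$ and $(2pqd)^2$ with $p = q = 0.5$, $a = d = 1$ and average effect $\alpha = a + d(q - p) = 1$.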
In non-human populations it is often possible to collect information in a controlled way. For example, among farm animals it is easy to arrange for a bull to produce offspring from a large number of cows and to control environments. Such experimental control is generally not possible with human data, which must rely on naturally occurring relationships and environments.
In classical quantitative genetics, there were two schools of thought regarding estimation of heritability.
One school of thought was developed by Sewall Wright at the University of Chicago, and further popularized by C. C. Li (University of Chicago) and J. L. Lush (Iowa State University). It is based on the analysis of correlations and, by extension, regression. Path analysis was developed by Sewall Wright as a way of estimating heritability.
The second was originally developed by Ronald Fisher and expanded at The University of Edinburgh, Iowa State University, and North Carolina State University, as well as other schools. It is based on the analysis of variance of breeding studies, using the intraclass correlation of relatives. Various methods of estimating components of variance (and, hence, heritability) from ANOVA are used in these analyses.
Today, heritability can be estimated from general pedigrees using linear mixed models and from genomic relatedness estimated from genetic markers.
Studies of human heritability often use adoption study designs, frequently with twins who have been separated early in life and raised in different environments. Such individuals have identical genotypes and can be used to separate the effects of genotype and environment. A limitation of this design is the common prenatal environment and the relatively low number of twins reared apart. A second and more common design is the twin study, in which the similarity of identical and fraternal twins is used to estimate heritability. These studies can be limited by the fact that identical twins are not completely genetically identical, potentially resulting in an underestimation of heritability.
In observational studies, or because of evocative effects (where a genome evokes environments by its effect on them), G and E may covary: gene–environment correlation. Depending on the methods used to estimate heritability, correlations between genetic factors and shared or non-shared environments may or may not be confounded with heritability.
In comparisons of close relatives, in general
$$h^2 = \frac{b}{r} = \frac{t}{r},$$
where $r$ can be thought of as the coefficient of relatedness, $b$ is the coefficient of regression and $t$ is the coefficient of correlation.
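For the regression of offspring phenotype on one parent's phenotype, $r = 1/2$, so $h^2 = 2b$. The Python sketch below computes the least-squares slope $b$ and then applies $h^2 = b/r$; the data are hypothetical and the function names are illustrative.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope b of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def heritability_from_relatives(b, r):
    """h^2 = b / r for a regression between relatives with relatedness r."""
    return b / r


# Hypothetical single-parent and offspring phenotypes.
parent = [10.0, 12.0, 14.0, 16.0]
offspring = [11.0, 11.5, 12.0, 12.5]
b = regression_slope(parent, offspring)
print(b, heritability_from_relatives(b, 0.5))  # 0.25 0.5
```

Regressing offspring on the mid-parent value (the mean of both parents) instead gives a slope that estimates $h^2$ directly.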
The effect of shared environment, $c^2$, contributes to similarity between siblings due to the commonality of the environment they are raised in. Shared environment is approximated by the DZ correlation minus half the heritability, one half being the degree to which DZ twins share the same genes: $c^2 = r_{DZ} - \tfrac{1}{2}h^2$. Unique environmental variance, $e^2$, reflects the degree to which identical twins raised together are dissimilar: $e^2 = 1 - r_{MZ}$.
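The twin-based estimates can be computed directly from the MZ and DZ correlations. The sketch below assumes Falconer's formula, $h^2 = 2(r_{MZ} - r_{DZ})$, which is not stated in this section but is the standard estimate for the classical twin design; the function name and the example correlations are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float):
    """Classical twin-design estimates from MZ and DZ twin correlations.

    Returns (h2, c2, e2): additive genetic, shared-environment and
    unique-environment proportions of phenotypic variance.
    """
    h2 = 2.0 * (r_mz - r_dz)  # Falconer's formula
    c2 = r_dz - 0.5 * h2      # DZ correlation minus half the heritability
    e2 = 1.0 - r_mz           # dissimilarity of MZ twins raised together
    return h2, c2, e2


h2, c2, e2 = falconer_ace(0.8, 0.5)  # hypothetical twin correlations
print(round(h2, 3), round(c2, 3), round(e2, 3))  # 0.6 0.2 0.2
```

By construction the three components always sum to 1; a negative $c^2$ or an $h^2$ above 1 signals that the simple model's assumptions (for example, no dominance or gene-environment correlation) do not hold for the data.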
In the simplest model, the phenotype is
$$P_{ij} = \mu + g_i + e_{ij},$$
where $g_i$ is the effect of genotype $G_i$ and $e_{ij}$ is the environmental effect.
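Consider an experiment with a group of sires and their progeny from random dams. Since the progeny get half of their genes from the father and half from their (random) mother, the progeny equation is
$$z_{ij} = \mu + \tfrac{1}{2}A_i + \epsilon_{ij},$$
where $A_i$ is the additive genetic value of sire $i$ and $\epsilon_{ij}$ collects the dam's contribution, Mendelian sampling and environmental effects.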
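Two groups of comparisons are possible. The first compares the progeny of the same sire with each other (called within sire group); their variance includes genetic and environmental terms and is treated as the error term. The second group compares the means of half sibs with each other (called among sire group). In addition to the error term of the within sire groups, it has an additional term due to the differences among the means of half sibs. The intraclass correlation is
$$t = \frac{\sigma_S^2}{\sigma_S^2 + \sigma_W^2} = \frac{\tfrac{1}{4}\mathrm{Var}(A)}{\mathrm{Var}(P)} = \tfrac{1}{4}h^2,$$
where $\sigma_S^2$ is the among-sire variance component and $\sigma_W^2$ is the within-sire variance component.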
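Table 1: ANOVA for sire experiment ($s$ sires, $n$ progeny per sire)
Source | d.f. | Mean square | Expected mean square
Between sire groups | $s - 1$ | $MS_S$ | $\sigma_W^2 + n\sigma_S^2$
Within sire groups | $s(n - 1)$ | $MS_W$ | $\sigma_W^2$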
The term $t$ is the intraclass correlation between half sibs. We can easily calculate $h^2 = 4t$. The expected mean squares are calculated from the relationship of the individuals (progeny within a sire are all half sibs, for example) and an understanding of intraclass correlations.
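The steps from the sire ANOVA to a heritability estimate can be sketched as follows: estimate $\sigma_S^2 = (MS_S - MS_W)/n$ and $\sigma_W^2 = MS_W$ from the observed mean squares, form the half-sib intraclass correlation $t = \sigma_S^2/(\sigma_S^2 + \sigma_W^2)$, and multiply by 4. This Python sketch assumes a balanced design (every sire has the same number $n$ of progeny) and truncates a negative variance estimate at zero, which is one common convention among several; the data and function name are hypothetical.

```python
def half_sib_heritability(groups):
    """Estimate h^2 from a balanced half-sib design (one list of progeny per sire).

    Uses the expected mean squares of a one-way ANOVA:
      E[MS_between] = sigma2_W + n * sigma2_S,  E[MS_within] = sigma2_W.
    """
    s = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal family sizes")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / s
    ms_between = n * sum((m - grand) ** 2 for m in means) / (s - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (s * (n - 1))
    sigma2_s = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at 0
    sigma2_w = ms_within
    t = sigma2_s / (sigma2_s + sigma2_w)  # half-sib intraclass correlation
    return 4.0 * t                        # h^2 = 4t


# Hypothetical progeny phenotypes for three sires, four progeny each.
sires = [[7.0, 8.0, 10.0, 11.0], [8.0, 9.0, 11.0, 12.0], [9.0, 10.0, 12.0, 13.0]]
print(half_sib_heritability(sires))
```

For these made-up data $MS_S = 4$ and $MS_W = 10/3$, giving $t = 1/21$ and $h^2 = 4/21 \approx 0.19$. Real analyses with unbalanced families usually switch to REML rather than these moment equations.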
The use of ANOVA to calculate heritability often fails to account for the presence of gene–environment interactions, because ANOVA has much lower statistical power for testing interaction effects than for testing direct effects.
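For a model with additive and dominance terms, but no others, the phenotype for a single locus is
$$P_{ij} = \mu + \alpha_i + \alpha_j + d_{ij} + e,$$
where $\alpha_i$ is the additive effect of the $i$th allele, $\alpha_j$ is the additive effect of the $j$th allele, $d_{ij}$ is the dominance deviation for the $ij$th genotype, and $e$ is the environment.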
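Experiments can be run with a similar setup to the one given in Table 1. Using different relationship groups, we can evaluate different intraclass correlations. Using $\sigma_A^2$ as the additive genetic variance and $\sigma_D^2$ as the dominance deviation variance, intraclass correlations become linear functions of these parameters. In general,
$$t = \frac{r\,\sigma_A^2 + \theta\,\sigma_D^2}{\sigma_P^2},$$
where $r$ and $\theta$ are found as
$r$ = the coefficient of relationship, twice the probability that alleles drawn at random, one from each member of the relationship pair, are identical by descent, and
$\theta$ = the probability that the genotypes of the relationship pair at a locus are identical by descent.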
Some common relationships and their coefficients are given in Table 2.
Table 2: Coefficients for calculating variance components
Relationship | $r$ | $\theta$
Identical twins | 1 | 1
Parent-offspring | 1/2 | 0
Half siblings | 1/4 | 0
Full siblings | 1/2 | 1/4
First cousins | 1/8 | 0
Double first cousins | 1/4 | 1/16
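These coefficients can be turned into expected correlations between relatives. The Python sketch below assumes the standard values of $r$ and $\theta$ for each relationship and ignores epistasis and shared environment; the dictionary and function names are hypothetical, and the variance components in the example are made up.

```python
# Standard (r, theta) coefficients for common relationships.
COEFFICIENTS = {
    "identical twins": (1.0, 1.0),
    "parent-offspring": (0.5, 0.0),
    "half siblings": (0.25, 0.0),
    "full siblings": (0.5, 0.25),
    "first cousins": (0.125, 0.0),
    "double first cousins": (0.25, 0.0625),
}


def expected_intraclass_correlation(relationship, var_a, var_d, var_p):
    """t = (r * Var(A) + theta * Var(D)) / Var(P).

    Ignores epistatic variance and any environment shared by the relatives.
    """
    r, theta = COEFFICIENTS[relationship]
    return (r * var_a + theta * var_d) / var_p


# Made-up variance components: Var(A) = 40, Var(D) = 10, Var(P) = 100.
for rel in ("full siblings", "half siblings"):
    print(rel, expected_intraclass_correlation(rel, 40.0, 10.0, 100.0))
```

Here full siblings are expected to correlate at 0.225 and half siblings at 0.1. Contrasting relationship pairs that differ in $\theta$ but not in $r$ (full siblings versus parent-offspring, for example) is what allows $\sigma_A^2$ and $\sigma_D^2$ to be separated.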
When a large, complex pedigree or one of the other types of data mentioned above is available, heritability and other quantitative genetic parameters can be estimated by restricted maximum likelihood (REML) or Bayesian methods. The raw data will usually have three or more data points for each individual: a code for the sire, a code for the dam and one or several trait values. Different trait values may be for different traits or for different time points of measurement.
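Pedigree-based mixed models commonly encode the sire and dam codes as an additive (numerator) relationship matrix, whose off-diagonal entries equal the coefficient of relationship $r$ for non-inbred relatives. The Python sketch below builds that matrix with the tabular method; it assumes that parents are listed before their offspring and that an unknown parent is coded as `None`. The function name and the pedigree are hypothetical, and dedicated programs use far more efficient algorithms for large pedigrees.

```python
def relationship_matrix(pedigree):
    """Additive (numerator) relationship matrix built with the tabular method.

    pedigree: list of (animal, sire, dam) tuples in which parents appear
    before their offspring; an unknown parent is coded as None.
    """
    index = {animal: i for i, (animal, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = index[sire] if sire is not None else None
        d = index[dam] if dam is not None else None
        # Off-diagonal: half the sum of the earlier animal's relationships
        # with this animal's sire and dam (0 for an unknown parent).
        for j in range(i):
            a_js = a[j][s] if s is not None else 0.0
            a_jd = a[j][d] if d is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 plus the inbreeding coefficient, which is half the
        # relationship between the two parents.
        inbreeding = 0.5 * a[s][d] if s is not None and d is not None else 0.0
        a[i][i] = 1.0 + inbreeding
    return a


pedigree = [
    ("S", None, None), ("D1", None, None), ("D2", None, None),
    ("X", "S", "D1"), ("Y", "S", "D1"), ("Z", "S", "D2"),
]
A = relationship_matrix(pedigree)
print(A[3][4], A[3][5], A[0][3])  # 0.5 0.25 0.5
```

In this pedigree one sire is mated to two dams: his two offspring from the same dam get a relationship of 0.5 (full siblings), his offspring from different dams get 0.25 (half siblings), and each offspring is related to him by 0.5.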
The currently popular methodology relies on high degrees of certainty over the identities of the sire and dam; it is not common to treat the sire identity probabilistically. This is not usually a problem, since the methodology is rarely applied to wild populations (although it has been used for several wild ungulate and bird populations), and sires are invariably known with a very high degree of certainty in breeding programmes. There are also algorithms that account for uncertain paternity.
The pedigrees can be viewed using programs such as Pedigree Viewer [1], and analyzed with programs such as ASReml, VCE [2], WOMBAT [3], MCMCglmm within the R environment [4] or the BLUPF90 family of programs [5].
Pedigree models are helpful for untangling confounds such as reverse causality, maternal effects such as the prenatal environment, and confounding of genetic dominance, shared environment, and maternal gene effects.
In the breeder's equation, $R = h^2 S$, the response to selection ($R$) is defined as the realized average difference between the parent generation and the next generation, and the selection differential ($S$) is defined as the average difference between the parent generation and the selected parents.
For example, imagine that a plant breeder is involved in a selective breeding project with the aim of increasing the number of kernels per ear of corn. For the sake of argument, let us assume that the average ear of corn in the parent generation has 100 kernels. Let us also assume that the selected parents produce corn with an average of 120 kernels per ear. If h2 equals 0.5, then the next generation will produce corn with an average of 0.5(120-100) = 10 additional kernels per ear. Therefore, the total number of kernels per ear of corn will equal, on average, 110.
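The arithmetic of the corn example is a direct application of $R = h^2 S$. The Python sketch below uses a hypothetical function name and simply adds the predicted response to the parent-generation mean.

```python
def predicted_offspring_mean(population_mean, selected_parent_mean, h2):
    """Breeder's equation R = h^2 * S, returned as the next generation's mean."""
    selection_differential = selected_parent_mean - population_mean  # S
    response = h2 * selection_differential                          # R
    return population_mean + response


print(predicted_offspring_mean(100, 120, 0.5))  # 110.0 kernels per ear
```

Note that only the additive (narrow-sense) heritability enters the prediction; dominance and epistatic variance are not reliably transmitted from the selected parents to their offspring.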
Observing the response to selection in an artificial selection experiment will allow calculation of realized heritability as in Fig. 4.
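When selection is repeated over several generations, one common way to summarize realized heritability is the slope of cumulative response on cumulative selection differential. The Python sketch below fits that slope through the origin, consistent with $R = h^2 S$ having no intercept; the generation-by-generation values are hypothetical, and fitting an intercept instead is an alternative when drift or environmental trends are a concern.

```python
def realized_heritability(cum_selection_differentials, cum_responses):
    """Slope of cumulative response on cumulative selection differential,
    fitted through the origin (R = h^2 * S with no intercept)."""
    sxy = sum(s * r for s, r in zip(cum_selection_differentials, cum_responses))
    sxx = sum(s * s for s in cum_selection_differentials)
    return sxy / sxx


# Hypothetical cumulative values after 1, 2 and 3 generations of selection.
cum_s = [20.0, 40.0, 60.0]
cum_r = [10.0, 20.0, 30.0]
print(realized_heritability(cum_s, cum_r))  # 0.5
```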
Heritability in the above equation is equal to the ratio $\mathrm{Var}(A)/\mathrm{Var}(P)$ only if the genotype and the environmental noise follow Gaussian distributions.
The controversy over heritability estimates stems largely from their basis in twin studies. The limited success of molecular-genetic studies in corroborating the conclusions of such population-genetic studies is known as the missing heritability problem. Eric Turkheimer has argued that newer molecular methods have vindicated the conventional interpretation of twin studies, although it remains mostly unclear how to explain the relations between genes and behaviors. According to Turkheimer, both genes and environment are heritable, genetic contribution varies by environment, and a focus on heritability distracts from other important factors. Overall, however, heritability is a widely applicable concept.